Skip to content

[Data] Introduce RenameExpr to separate rename semantics from AliasExpr#60804

Open
slfan1989 wants to merge 3 commits intoray-project:masterfrom
slfan1989:data/introduce-rename-expr
Open

[Data] Introduce RenameExpr to separate rename semantics from AliasExpr#60804
slfan1989 wants to merge 3 commits intoray-project:masterfrom
slfan1989:data/introduce-rename-expr

Conversation

@slfan1989
Copy link
Contributor

@slfan1989 slfan1989 commented Feb 6, 2026

Description

This PR introduces a dedicated RenameExpr type to replace the _is_rename flag in AliasExpr, providing clearer semantics and better type safety for column renaming operations in Ray Data expressions.

Key Changes:

  • Add RenameExpr class for explicit column renaming semantics
  • Remove _is_rename flag from AliasExpr (now purely for computed columns/aliasing)
  • Update all 9 visitor implementations to handle RenameExpr
  • Update logical optimization rules for clearer rename vs alias handling
  • Add comprehensive tests for RenameExpr behavior and structural equality

Behavioral Changes:

  • col("a")._rename("b") now creates RenameExpr instead of AliasExpr
  • col("a").alias("b") remains AliasExpr (semantically correct for aliasing computed expressions)
  • Projection pushdown now correctly distinguishes rename from alias operations
  • Rename substitution with non-column expressions downgrades to alias
  • Rename substitution with aliased columns unwraps the alias and preserves rename semantics

Benefits:

  • Resolves the TODO in _ColumnSubstitutionVisitor
  • Makes rename semantics explicit in the type system
  • Enables future optimizations by separating rename and alias concerns
  • Improves code clarity and maintainability

Related issues

N/A

Additional information

This refactoring maintains backward compatibility while improving the internal representation. All existing tests pass, and new tests validate the RenameExpr behavior including structural equality, evaluation equivalence, and proper handling during column substitution.

This PR introduces a dedicated RenameExpr type to replace the _is_rename
flag in AliasExpr, providing clearer semantics and better type safety for
column renaming operations.

Changes:
- Add RenameExpr class for explicit column renaming semantics
- Remove _is_rename flag from AliasExpr (now purely for computed columns)
- Update all visitors to handle RenameExpr (9 visitor implementations)
- Update logical optimization rules for clearer rename vs alias handling
- Add comprehensive tests for RenameExpr behavior

Behavioral changes:
- col("a")._rename("b") now creates RenameExpr instead of AliasExpr
- col("a").alias("b") remains AliasExpr (semantically correct for aliasing)
- Projection pushdown now correctly distinguishes rename from alias operations

This resolves the TODO in _ColumnSubstitutionVisitor and enables future
optimizations by making rename semantics explicit in the type system.

Signed-off-by: slfan1989 <slfan1989@apache.org>
@slfan1989 slfan1989 requested a review from a team as a code owner February 6, 2026 08:58
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a dedicated RenameExpr to separate column renaming from aliasing computed columns, which was previously handled by AliasExpr with an internal flag. This is a significant improvement for code clarity and type safety. The changes are well-executed, with updates to all necessary expression visitors and logical optimization rules. The addition of comprehensive tests for RenameExpr ensures the new functionality is robust. I have one suggestion to improve the consistency of expression substitution logic.

Comment on lines 285 to 290
visited = self.visit(expr.expr)
if isinstance(visited, ColumnExpr):
return RenameExpr(
data_type=visited.data_type, expr=visited, _name=expr.name
)
return AliasExpr(data_type=visited.data_type, expr=visited, _name=expr.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with visit_alias, consider calling _unalias() on the visited expression. This would prevent creating nested aliases if the substitution itself results in an AliasExpr and would correctly preserve the RenameExpr type when possible.

For example, if col('a')._rename('b') is visited with a substitution {'a': col('c').alias('d')}:

  • The current implementation produces a nested alias: (col('c').alias('d')).alias('b').
  • With _unalias(), it would become col('c')._rename('b'), which seems more correct as the intermediate alias d is being renamed to b anyway.
Suggested change
visited = self.visit(expr.expr)
if isinstance(visited, ColumnExpr):
return RenameExpr(
data_type=visited.data_type, expr=visited, _name=expr.name
)
return AliasExpr(data_type=visited.data_type, expr=visited, _name=expr.name)
visited = self.visit(expr.expr)._unalias()
if isinstance(visited, ColumnExpr):
return RenameExpr(
data_type=visited.data_type, expr=visited, _name=expr.name
)
return AliasExpr(data_type=visited.data_type, expr=visited, _name=expr.name)

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@ray-gardener ray-gardener bot added the community-contribution Contributed by the community label Feb 6, 2026
This PR introduces a dedicated RenameExpr type to replace the _is_rename
flag in AliasExpr, providing clearer semantics and better type safety for
column renaming operations.

Changes:
- Add RenameExpr class for explicit column renaming semantics
- Remove _is_rename flag from AliasExpr (now purely for computed columns)
- Update all visitors to handle RenameExpr (9 visitor implementations)
- Update logical optimization rules for clearer rename vs alias handling
- Add comprehensive tests for RenameExpr behavior

Behavioral changes:
- col("a")._rename("b") now creates RenameExpr instead of AliasExpr
- col("a").alias("b") remains AliasExpr (semantically correct for aliasing)
- Projection pushdown now correctly distinguishes rename from alias operations

This resolves the TODO in _ColumnSubstitutionVisitor and enables future
optimizations by making rename semantics explicit in the type system.

Signed-off-by: slfan1989 <slfan1989@apache.org>
This PR introduces a dedicated RenameExpr type to replace the _is_rename
flag in AliasExpr, providing clearer semantics and better type safety for
column renaming operations.

Changes:
- Add RenameExpr class for explicit column renaming semantics
- Remove _is_rename flag from AliasExpr (now purely for computed columns)
- Update all visitors to handle RenameExpr (9 visitor implementations)
- Update logical optimization rules for clearer rename vs alias handling
- Add comprehensive tests for RenameExpr behavior

Behavioral changes:
- col("a")._rename("b") now creates RenameExpr instead of AliasExpr
- col("a").alias("b") remains AliasExpr (semantically correct for aliasing)
- Projection pushdown now correctly distinguishes rename from alias operations

This resolves the TODO in _ColumnSubstitutionVisitor and enables future
optimizations by making rename semantics explicit in the type system.

Signed-off-by: slfan1989 <slfan1989@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant